    • Bandwidth Control And Fairness In The Linux Scheduler 

      Ugedal, Odin (Master thesis, 2021)
      As more and more of the data center industry focuses on shared infrastructure, the need for proper resource management and for prioritizing programs with differing priorities has become even more important. ...
    • BTB-X: A Storage-Effective BTB Organization 

      Asheim, Truls; Grot, Boris; Kumar, Rakesh (Peer reviewed; Journal article, 2021)
      Many contemporary applications feature multi-megabyte instruction footprints that overwhelm the capacity of branch target buffers (BTB) and instruction caches (L1-I), causing frequent front-end stalls that inevitably hurt ...
    • Delay and Bypass: Ready and Criticality Aware Instruction Scheduling in Out-of-Order Processors 

      Alipour, Mehdi; Kumar, Rakesh; Kaxiras, Stefanos; Black-Schaffer, David (Peer reviewed; Journal article, 2020)
      Flexible instruction scheduling is essential for performance in out-of-order processors. This is typically achieved by using CAM-based Instruction Queues (IQs) that provide complete flexibility in choosing ready instructions ...
    • Dependence-aware Slice Execution to Boost MLP in Slice-out-of-order Cores 

      Kumar, Rakesh; Alipour, Mehdi; Black-Schaffer, David (Peer reviewed; Journal article, 2022)
    • Eliminating Unnecessary Broadcasts to Simplify Out-of-Order Instruction Scheduling 

      Caccialino, Marco (Master thesis, 2020)
      Over the past decades the industry has been pushing to find ways to achieve better performance and efficiency. The natural evolution of microarchitecture has introduced first the pipeline and then out-of-order superscalar ...
    • FIFOrder MicroArchitecture: Ready-Aware Instruction Scheduling for OoO Processors 

      Alipour, Mehdi; Kumar, Rakesh; Kaxiras, Stefanos; Black-Schaffer, David (Peer reviewed; Journal article, 2019)
      The number of instructions a processor's instruction queue can examine (depth) and the number it can issue together (width) determine its ability to take advantage of the ILP in an application. Unfortunately, increasing ...
    • Freeway: Maximizing MLP for Slice-Out-of-Order Execution 

      Kumar, Rakesh; Alipour, Mehdi; Black-Schaffer, David (Peer reviewed; Journal article, 2019)
      Exploiting memory level parallelism (MLP) is crucial to hide long memory and last level cache access latencies. While out-of-order (OoO) cores, and techniques building on them, are effective at exploiting MLP, they deliver ...
    • Impact of Microarchitectural State Reuse on Serverless Functions 

      Asheim, Truls; Ahmed Khan, Tanvir; Kasikci, Baris; Kumar, Rakesh (Chapter, 2022)
      Serverless computing has seen rapid growth in the past few years due to its seamless scalability and zero resource provisioning overhead for developers. In serverless, applications are composed of a set of very short-running ...
    • Mitigating Unnecessary Throttling in Linux CFS Bandwidth Control 

      Ugedal, Odin; Kumar, Rakesh (Chapter, 2022)
    • Probing the Armv8-A ISA for Hidden Instructions through Processor Fuzzing 

      Strupe, Fredrik (Master thesis, 2020)
      Society's growing use of computers reinforces the need to verify and inspect the systems being deployed, as a measure to ensure that they contain no vulnerabilities or backdoors. For ...
    • Shooting Down the Server Front-End Bottleneck 

      Kumar, Rakesh; Grot, Boris (Peer reviewed; Journal article, 2022)
      The front-end bottleneck is a well-established problem in server workloads owing to their deep software stacks and large instruction footprints. Despite years of research into effective L1-I and BTB prefetching, state-of-the-art ...
    • A Specialized BTB Organization for Servers 

      Asheim, Truls; Grot, Boris; Kumar, Rakesh (Chapter, 2022)
      Contemporary server applications feature massive instruction footprints stemming from deeply layered software stacks. These footprints far exceed the capacity of the branch target buffer (BTB) and instruction cache (L1-I), ...
    • The Mosaic IQ Microarchitecture: A Set-Associative Approach for Efficient Operand Wake-Up in OoO Cores 

      Baumann, Henrik Rambech (Master thesis, 2023)
      In recent years, the focus on energy-efficient processor cores has led to research into methods for maximizing the performance of microchips within given energy budgets. In out-of-order processors, flexible ...
    • Twig: Profile-Guided BTB Prefetching for Data Center Applications 

      Ahmed Khan, Tanvir; Brown, Nathan; Sriraman, Akshitha; Soundararajan, Niranjan; Kumar, Rakesh; Devietti, Joseph; Subramoney, Sreenivas; Pokam, Gilles; Litz, Heiner; Kasikci, Baris (Chapter, 2021)
      Modern data center applications have deep software stacks, with instruction footprints that are orders of magnitude larger than typical instruction cache (I-cache) sizes. To efficiently prefetch instructions into the I-cache ...